Programming Massively Parallel Processors: A Practical Course: The Origins of GPU Computing

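The GPU was born of a fundamental shift driven by the "Real-Time Imperative": a complex 3D scene must be rendered within $1/60$ of a second (16.67 ms), and that deadline is non-negotiable. CPUs, even as they followed the multicore trajectory, remained optimized for low-latency serial execution, so their rendering performance hit a bottleneck as display resolutions rose.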
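
The frame budget above is simple arithmetic, and a short sketch makes it concrete. The function names and the example resolutions are illustrative assumptions, not part of the course material; the per-pixel figure assumes pixels are computed strictly one at a time, which is the worst case for a serial processor.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps


def per_pixel_budget_ns(fps: float, width: int, height: int) -> float:
    """Average time per pixel, in nanoseconds, if pixels are computed serially."""
    pixels = width * height
    return frame_budget_ms(fps) * 1_000_000 / pixels


if __name__ == "__main__":
    print(f"Frame budget at 60 FPS: {frame_budget_ms(60):.2f} ms")
    # Example resolutions chosen for illustration only.
    for width, height in [(640, 480), (1024, 768), (1920, 1080)]:
        ns = per_pixel_budget_ns(60, width, height)
        print(f"{width}x{height}: {ns:.1f} ns per pixel")
```

At 60 FPS the budget is 16.67 ms, and at 640x480 a serial processor would have roughly 54 ns per pixel. The per-pixel budget shrinks in proportion to the pixel count, which is why higher resolutions push a serial design past its limit.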

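In the mid-1990s the games industry faced a crisis. A single-threaded CPU that was also busy with AI and physics could not compute millions of pixel values fast enough to keep frame rates smooth. This forced the industry to develop dedicated hardware to offload the repetitive graphics pipeline.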

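Before on-chip parallel arrays appeared, 3dfx introduced Scan-Line Interleave (SLI). By having two physical graphics cards render alternating horizontal scan lines, it shifted the industry's focus from single-thread speed to raw "brute force" throughput.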
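
The alternating-scan-line idea can be sketched in software. This is only a model of the work split, not of how the 3dfx hardware was implemented; `assign_scanlines`, `render_interleaved`, and the `shade` callback are hypothetical names introduced for illustration.

```python
def assign_scanlines(height, num_cards=2):
    """Map each card index to the scan lines it renders (round-robin by row)."""
    return {card: list(range(card, height, num_cards)) for card in range(num_cards)}


def render_interleaved(height, width, shade, num_cards=2):
    """Build a frame by letting each card fill only its own scan lines.

    The loop over cards runs sequentially here; on real hardware each card
    would work on its rows at the same time.
    """
    frame = [None] * height
    for card, rows in assign_scanlines(height, num_cards).items():
        for y in rows:
            frame[y] = [shade(x, y) for x in range(width)]
    return frame


if __name__ == "__main__":
    # With two cards and six rows, card 0 gets the even rows, card 1 the odd rows.
    print(assign_scanlines(6))
```

Because no scan line depends on another, the rows can be split without any coordination between cards, and each card handles about half of the frame.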

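GPU design devotes silicon area to simple arithmetic units rather than to complex branch prediction. This "Wide and Slow" philosophy, many simple processors running at lower clock speeds, lets the GPU handle the repetitive math of triangles while leaving the CPU to handle logic that does not parallelize.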
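
A toy timing model shows why the trade-off pays off only for parallel work. All numbers below (unit counts and clock rates) are made-up assumptions for illustration, not measurements of any real chip, and the model assumes one operation per unit per cycle with no memory stalls.

```python
def parallel_time_s(num_ops, units, clock_hz):
    """Time for num_ops independent operations, one op per unit per cycle."""
    cycles = -(-num_ops // units)  # ceiling division
    return cycles / clock_hz


def serial_time_s(num_ops, clock_hz):
    """Time for a chain of dependent operations: only one unit can be busy."""
    return num_ops / clock_hz


if __name__ == "__main__":
    ops = 1_000_000
    # Hypothetical designs: a few fast units versus many slower units.
    narrow_fast = {"units": 4, "clock_hz": 3e9}
    wide_slow = {"units": 1024, "clock_hz": 1e9}

    for name, chip in [("narrow/fast", narrow_fast), ("wide/slow", wide_slow)]:
        par = parallel_time_s(ops, chip["units"], chip["clock_hz"])
        ser = serial_time_s(ops, chip["clock_hz"])
        print(f"{name}: parallel {par * 1e6:.2f} us, serial {ser * 1e6:.2f} us")
```

Under these assumptions the wide design finishes the independent operations far sooner, while the narrow design wins on the dependent chain. That is the division of labor described above: repetitive pixel and triangle math on the GPU, non-parallel logic on the CPU.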

QUESTION 1

What is the specific 'time budget' required for 60 frames per second (FPS)?

33.33ms

16.67ms

10.00ms

100.00ms

QUESTION 2

How did 3dfx's SLI achieve early parallelism in consumer hardware?

By increasing the clock speed of a single chip.

By having two cards render alternating horizontal scan lines.

By sharing AI logic between the GPU and CPU.

By reducing the resolution of the frame.

QUESTION 3

Why did the GPU diverge from the standard multicore trajectory of CPUs?

GPUs needed deeper caches for complex branching.

GPUs prioritize throughput of simple math over low-latency serial logic.

CPUs became too expensive to manufacture for 3D graphics.

GPU architectures were designed to be smaller than CPUs.

QUESTION 4

In the context of 1990s gaming, what was the 'Real-Time Imperative'?

The requirement to run physics simulations on the GPU.

Processing millions of pixels within the strict frame window.

The transition from 16-bit to 32-bit computing.

Allowing the CPU to handle rasterization.

QUESTION 5

What is meant by the GPU's 'Wide and Slow' philosophy?

Using many simple processors at lower clock speeds to do massive work.

Designing physically wide chips that take longer to process data.

A design that favors high latency but high memory capacity.

Optimizing for single-threaded serial logic.